Defend against ACP server reporting used > size by timvisher-dd · Pull Request #364 · xenodium/agent-shell

timvisher-dd · 2026-03-03T19:46:25Z

The ACP server (claude-agent-acp) has a bug where model switches cause used to exceed size in session/update notifications. For example, switching from Opus 1M to Sonnet 200k drops size to 200000 while used keeps growing past it (observed: 419574/200000 = 209.8%). This results in nonsensical context indicators and percentages e.g. (from a real session)

 Context: 420k/200k (209.8%)
  Tokens: 32 in · 11k out · 11m cached (11m total)
    Cost: USD75.96

While I intend to get a fix for this improper reporting into claude-agent-acp, agent-shell should be robust against it. To that end, when the garbage usage data is observed, the UI now signals unreliable data instead of showing nonsense:

Context indicator shows ? with warning face when used > size
Formatted usage shows raw numbers with (?) instead of a bogus percentage
A regression test replays the real observed ACP traffic from the model-switch scenario so this class of bug is caught going forward

This also adds comprehensive ERT test coverage for agent-shell-usage.el: notification updates, indicator scaling/colors, compaction replay, token saving, and number formatting.

For the claude-agent-acp side of this see agentclientprotocol/claude-agent-acp#412

Test plan

All 21 ERT tests pass
checkdoc clean on both files
byte-compile clean on both files
Manual baking verification

CI workflow runs byte-compilation and ERT tests on push/PR using GitHub Actions with deps checked out from timvisher-dd/acp.el-plus and xenodium/shell-maker. bin/test parses ci.yml with yq so local runs stay in sync with CI automatically. It symlinks local dependency checkouts into deps/ to match the CI layout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New library for sending desktop notifications from Emacs. In GUI mode on macOS, uses native UNUserNotificationCenter via a dynamic module (agent-shell-alert-mac.dylib) compiled JIT on first use (inspired by vterm). When compilation fails (e.g. missing Xcode CLI tools), a message recommends `xcode-select --install`. In terminal mode, auto-detects the host terminal emulator and sends the appropriate OSC escape sequence: - OSC 9: iTerm2, Ghostty, WezTerm, foot, mintty, ConEmu - OSC 99: kitty - OSC 777: urxvt, VTE-based terminals Inside tmux, wraps in DCS passthrough (checking allow-passthrough first). Falls back to osascript on macOS when the terminal is unknown or tmux passthrough is not enabled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Follows the same pattern as acp.el: a boolean toggle (agent-shell-logging-enabled, off by default), a per-shell log buffer stored in state, and label+format-string logging. Adds log calls to idle notification start/cancel/fire for observability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

After each agent turn completes, a 30s timer starts. Any user input in the buffer cancels it; otherwise it fires a desktop notification via agent-shell-alert. The echo area message is only shown when the shell buffer is not the active buffer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Validate that agent-shell-buffers and agent-shell-project-buffers reflect (buffer-list) ordering correctly: switch-to-buffer and select-window promote, with-current-buffer does not, bury-buffer demotes, and project filtering preserves order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fails PRs that modify .el files or tests/ without also updating README.org, ensuring the soft-fork features list stays current. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add comprehensive ERT tests for agent-shell-usage.el covering notification updates, context indicator scaling/colors, compaction replay, token saving, and number formatting. The ACP server has a bug where model switches cause used to exceed size in session/update notifications. Rather than clamping, signal unreliable data: indicator shows ? with warning face, format shows (?) instead of a bogus percentage. A regression test replays real observed traffic from the Opus 1M -> Sonnet 200k switch scenario. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The `usage_update` notification reports `size: 200000` even when the active model has a 1M context window (e.g. `opus[1m]`), causing clients to display incorrect context utilization (e.g. `689k/200k (344.3%)` instead of `689k/1000k (68.9%)`). Four bugs fixed: - **Min across all models**: The original code used `Math.min` across all `modelUsage` entries, so subagent models (Sonnet/Haiku with 200k windows) dragged down the reported size for the main Opus 1M model. Now tracks the top-level assistant model and looks up its context window specifically. - **Model name mismatch**: The SDK's streaming path keys `modelUsage` by the requested model alias (e.g. `claude-opus-4-6`) while `BetaMessage.model` on assistant messages has the resolved API response model (e.g. `claude-opus-4-6-20250514`). The exact-match lookup always missed, falling back to the hardcoded 200k default. Now falls back to prefix matching, preferring the longest/most-specific match. - **Synthetic messages corrupt model tracking**: `/compact` and similar commands emit assistant messages with `model: "<synthetic>"`. These were updating `lastAssistantModel`, causing the next `usage_update` to miss the `modelUsage` lookup and fall back to the 200k default. Now filters out `<synthetic>` models. - **Stale usage after compaction**: No `usage_update` was sent on `compact_boundary`, so clients kept showing the pre-compaction context size (e.g. `944k/1m`) right after "Compacting completed" until the next full turn. Now sends `used: 0` immediately on compaction. This is a deliberate approximation — the exact post-compaction size isn't known until the SDK's next API call, which replaces it within seconds. The alternative (no update) is worse UX: showing a full context bar right after compaction. Eight new tests cover: token sum correctness, current-model context window lookup, model switching, subagent isolation, prefix matching in both directions, and synthetic message filtering. Note: `bin/test` (local CI validation script) is cherry-picked from #353. Would fix `agent-shell`'s usage indicator which currently has to defend against this broken math: xenodium/agent-shell#364 ## Testing - [x] Manual riding with it for a bit. --------- Co-authored-by: Tim Visher <194828183+timvisher-dd@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Steffen Deusch <steffen@deusch.me>

timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from e528a97 to 3e99679 Compare March 3, 2026 19:49

timvisher-dd changed the title ~~Add usage tracking tests and fix existing test failures~~ Add usage tracking and context indicator regression tests Mar 3, 2026

timvisher-dd marked this pull request as ready for review March 3, 2026 20:08

timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 3e99679 to 6d914d4 Compare March 11, 2026 17:31

timvisher-dd changed the title ~~Add usage tracking and context indicator regression tests~~ Defend against ACP server reporting used > size Mar 11, 2026

timvisher-dd marked this pull request as draft March 12, 2026 15:54

timvisher-dd mentioned this pull request Mar 12, 2026

fix: use current model's context window for usage_update size agentclientprotocol/claude-agent-acp#412

Merged

1 task

timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 6d914d4 to c98e427 Compare March 13, 2026 01:20

timvisher-dd marked this pull request as ready for review March 14, 2026 22:00

timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch 2 times, most recently from e2ce5c0 to f64cdbd Compare March 14, 2026 22:02

timvisher-dd closed this Mar 14, 2026

timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from f64cdbd to 2f153b2 Compare March 14, 2026 23:53

timvisher-dd reopened this Mar 15, 2026

timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 2f153b2 to 8b63348 Compare March 15, 2026 16:05

timvisher-dd and others added 10 commits March 15, 2026 14:49

Add soft-fork README header with features list

021807a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add CI check that README.org is updated when code changes

cc9ab0e

Fails PRs that modify .el files or tests/ without also updating README.org, ensuring the soft-fork features list stays current. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Backfill PR #3 entry in README.org features list

05ab3f0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add PR #5 entry to README.org features list

b9ca659

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 8b63348 to b9ca659 Compare March 15, 2026 20:03

timvisher-dd mentioned this pull request Mar 15, 2026

Defend against ACP server reporting used > size timvisher-dd/agent-shell-plus#5

Merged

4 tasks

timvisher-dd closed this in timvisher-dd/agent-shell-plus#5 Mar 15, 2026

timvisher-dd deleted the timvisher-dd/tests/add-shell-usage-regression-tests branch March 15, 2026 20:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Defend against ACP server reporting used > size#364

Defend against ACP server reporting used > size#364
timvisher-dd wants to merge 10 commits intoxenodium:mainfrom
timvisher-dd:timvisher-dd/tests/add-shell-usage-regression-tests

timvisher-dd commented Mar 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

timvisher-dd commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

timvisher-dd commented Mar 3, 2026 •

edited

Loading